Content aggregation in natural language hypertext summarization of OLAP and Data Mining Discoveries

Authors

  • Jacques Robin
  • Eloi L. Favero
Abstract

We present a new approach to paratactic content aggregation in the context of generating hypertext summaries of OLAP and data mining discoveries. Two key properties make this approach innovative and interesting: (1) it encapsulates aggregation inside the sentence planning component, and (2) it relies on a domain-independent algorithm working on a data structure that abstracts from lexical and syntactic knowledge.

1 Research context: hypertext executive summary generation for intelligent decision-support

In this paper, we present a new approach to content aggregation in Natural Language Generation (NLG). This approach has been developed for the NLG system HYSSOP (HYpertext Summary System of On-line analytical Processing), which summarizes OLAP (On-Line Analytical Processing) and data mining discoveries into a hypertext report. HYSSOP is itself part of the Intelligent Decision-Support System (IDSS) MATRIKS (Multidimensional Analysis and Textual Reporting for Insight Knowledge Search), which aims to provide a comprehensive knowledge discovery environment through seamless integration of data warehousing, OLAP, data mining, expert system and NLG technologies.

1.1 The MATRIKS intelligent decision-support system

The architecture of MATRIKS is given in Fig. 1. It extends previous cutting-edge environments for Knowledge Discovery in Databases (KDD), such as DBMiner (Han et al. 1997), by integrating:

  • a data warehouse hypercube exploration expert system, which automates and preserves the dimensional data warehouse exploration strategies that human data analysts develop using OLAP queries and data mining tools;
  • a hypertext executive summary generator, which reports data hypercube exploration insights in the most concise and familiar way: a few web pages of natural language.

These two extensions allow an IDSS to be used directly by decision makers without constant mediation of a data analyst.

1.2 The HYSSOP natural language hypertext summary generator

To our knowledge, the development of HYSSOP is pioneering work in coupling OLAP and data mining with natural language generation (Fig. 2). We view such coupling as a synergetic fit with tremendous potential for a wide range of practical applications. In a nutshell¹, while NLG is the only technology able to completely fulfill the reporting needs of OLAP and data mining, these two technologies are reciprocally the only ones able to completely fulfill the content determination needs of a key NLG application sub-class: textual summarization of quantitative data.

¹ See Favero (2000) for further justification for this view, as well as for details on the motivation and technology underlying MATRIKS.

Similar resources

Using OLAP and Data Mining for Content Planning in Natural Language Generation

We present a new approach to content determination and content organization in the context of natural language generation for quantitative database summaries. Three key properties make our work innovative and interesting: (1) we developed a new text planning approach to deal with the content organization of a data set into a summary report, for example a Data Mining discovery; (2) the approach...

HYSSOP: Natural Language Generation Meets Knowledge Discovery in Databases

In this paper, we present HYSSOP, a system that generates natural language hypertext summaries of insights resulting from a knowledge discovery process. We discuss the synergy between the two technologies underlying HYSSOP: Natural Language Generation (NLG) and Knowledge Discovery in Databases (KDD). We first highlight the advantages of natural language hypertext as a summarization medium for K...

Survey on Opinion Mining and Summarization of User Reviews on Web

A large amount of user-generated data is present on the web as blogs, reviews, tweets, comments, etc. This data involves users' opinions, views, attitudes, and sentiments towards a particular product, topic, event, news item, etc. Opinion mining (sentiment analysis) is the process of finding users' opinions in user-generated content. Opinion summarization is useful in feedback analysis, business decision making and reco...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

An Approach for Concept-based Automatic Multi-Document Summarization using Machine Learning

Text Summarization is compressing the source text into a shorter version preserving its information content and overall meaning. It is very complicated for human beings to manually summarize large documents of text. Text summarization plays an important role in the area of natural language processing and text mining. Many approaches use statistics and machine learning techniques to extract sent...

Publication date: 2000